# Low-latency inference

### Phi Mini MoE Instruct GGUF
gabriellarson · MIT · Large Language Model, English · 2,458 downloads · 1 like
Phi-mini-MoE is a lightweight Mixture-of-Experts (MoE) model for English business and research use, well suited to resource-constrained, low-latency deployments.

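What keeps an MoE model like this cheap at inference time is that each token is routed to only a few experts. A minimal numpy sketch of top-2 routing for a single token (illustrative only; Phi-mini-MoE's actual router, expert count, and expert sizes differ):

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_model, top_k = 4, 8, 2
x = rng.standard_normal(d_model)                    # one token's hidden state
W_gate = rng.standard_normal((d_model, n_experts))  # router weights (hypothetical)
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

# Router: score every expert, keep only the top-k scores.
logits = x @ W_gate
top = np.argsort(logits)[-top_k:]
weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over selected

# Output is the weighted sum of the selected experts' outputs; the other
# experts are never evaluated, so compute scales with k, not n_experts.
y = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
print(y.shape)  # (8,)
```

Only `top_k` of the `n_experts` matrix multiplies run per token, which is the source of the low-latency claim.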
### Sarvam Finetune
jk12p · Large Language Model, Transformers · 112 downloads · 1 like
A transformers model published on the Hugging Face Hub; its model card does not yet describe its function or details.

### Unlearn Tofu Llama 3.2 1B Instruct Forget10 SimNPO Lr1e 05 B4.5 A1 D0 G0.25 Ep5
open-unlearning · Large Language Model, Transformers · 153 downloads · 1 like
A transformers model uploaded to the Hugging Face Hub; detailed information has not yet been provided.

### Qwen3 14b Ug40 Pretrained
jq · Large Language Model, Transformers · 1,757 downloads · 1 like
An automatically generated transformers model card with no specific model information.

### Sn29 Q1m4 Dx9i
mci29 · Large Language Model, Transformers · 1,797 downloads · 1 like
A transformers model published on the Hugging Face Hub; specific information has not yet been provided.

### Mistral Small 3.1 24B Instruct 2503 Quantized.w8a8
RedHatAI · Apache-2.0 · Safetensors, Supports Multiple Languages · 833 downloads · 2 likes
An INT8-quantized build of Mistral-Small-3.1-24B-Instruct-2503, optimized by Red Hat and Neural Magic for fast, low-latency serving.

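A "w8a8" scheme stores weights (and activations) as 8-bit integers plus a floating-point scale, roughly quartering memory versus FP32. A minimal sketch of symmetric per-tensor INT8 round-trip quantization (an illustration of the idea, not Red Hat's actual compression recipe):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: int8 values plus one FP scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)  # 0.25: int8 takes a quarter of the FP32 bytes
```

Rounding error is bounded by half the scale, which is why accuracy typically drops only slightly while memory and bandwidth needs fall sharply.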
### Mistral Small 3.1 24B Instruct 2503 FP8 Dynamic
RedHatAI · Apache-2.0 · Safetensors, Supports Multiple Languages · 2,650 downloads · 5 likes
A 24B-parameter conditional-generation model on the Mistral3 architecture with FP8 dynamic quantization, suited to multilingual text generation and visual understanding.

### Mistral Small 3.1 24B Instruct 2503
chutesai · Apache-2.0 · Image-to-Text, Supports Multiple Languages · 2,035 downloads · 0 likes
Mistral Small 3.1 is a 24B-parameter multimodal language model with visual understanding and a 128k-token context window, applicable to a wide range of tasks.

### Sana Sprint 1.6B 1024px
Efficient-Large-Model · Image Generation, Supports Multiple Languages · 475 downloads · 12 likes
SANA-Sprint is an ultra-efficient text-to-image diffusion model that cuts inference from 20 steps down to 1-4 while maintaining top-tier quality.

### Canary 1b Flash
nvidia · Speech Recognition, Supports Multiple Languages · 125.22k downloads · 186 likes
NVIDIA NeMo Canary Flash is a family of multilingual, multitask models with state-of-the-art results on multiple speech benchmarks, supporting automatic speech recognition and translation across four languages.

### Mistral Small 24B Instruct 2501 Quantized.w8a8
RedHatAI · Apache-2.0 · Large Language Model, Safetensors, Supports Multiple Languages · 158 downloads · 1 like
An INT8-quantized 24B Mistral instruction-tuned model; quantization substantially reduces GPU memory requirements and raises computational throughput.

### Phi 4 Multimodal Instruct
Robeeeeeeeeeee · MIT · Multimodal Fusion, Transformers, Supports Multiple Languages · 21 downloads · 1 like
Phi-4-multimodal-instruct is a lightweight open multimodal foundation model built on the language, vision, and speech research and datasets behind the Phi-3.5 and Phi-4 models. It accepts text, image, and audio inputs, produces text outputs, and supports a 128K-token context.

### Quickmt Zh En
quickmt · Machine Translation, Supports Multiple Languages · 23 downloads · 1 like
A fast and accurate neural machine translation model for Chinese-to-English translation.

### Whisper Large V3 Distil Multi7 V0.2
bofenghuang · MIT · Speech Recognition, Transformers, Supports Multiple Languages · 119 downloads · 1 like
A distilled multilingual Whisper model covering automatic speech recognition in seven European languages, with code-switching support.

### Bart Large Mnli Openvino
Smashyalts · MIT · Text Classification · 16 downloads · 0 likes
An OpenVINO-optimized build of the facebook/bart-large-mnli model for zero-shot text classification.

### Vectorizer.guava
sinequa · Text Embedding, Supports Multiple Languages · 204 downloads · 1 like
A vectorizer developed by Sinequa that generates embedding vectors from input passages or queries for sentence-similarity and retrieval tasks.

### Kotoba Whisper V2.0
kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Japanese · 8,108 downloads · 60 likes
Kotoba-Whisper is a Japanese speech-recognition model distilled from Whisper large-v3, developed by Asahi Ushio in collaboration with Kotoba Technologies; it runs inference 6.3x faster than the original.

### Show O
showlab · MIT · Text-to-Video · 225 downloads · 16 likes
Show-o is a PyTorch-based any-to-any model that converts between multiple input and output modalities.

### Zamba2 2.7B
Zyphra · Apache-2.0 · Large Language Model, Transformers · 2,550 downloads · 77 likes
Zamba2-2.7B is a hybrid of state-space and Transformer blocks, combining Mamba2 modules with a shared attention module for high performance at low latency.

### Snowflake Arctic Embed M V1.5
Snowflake · Apache-2.0 · Text Embedding · 219.46k downloads · 58 likes
Snowflake Arctic Embed M v1.5 is an efficient sentence-embedding model focused on sentence similarity and feature extraction.

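Embedding models like this one map sentences to vectors whose cosine similarity tracks semantic closeness, which is what retrieval builds on. A toy sketch with made-up 3-d vectors (in practice the vectors come from the encoder, e.g. via sentence-transformers; the document names and values here are hypothetical):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product of the two vectors over their norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings standing in for encoder outputs.
query = np.array([0.9, 0.1, 0.2])
docs = {
    "pricing page": np.array([0.8, 0.2, 0.1]),
    "cat photos":   np.array([0.0, 0.9, 0.4]),
}

# Rank documents by similarity to the query and take the best match.
scores = {name: cosine(query, v) for name, v in docs.items()}
best = max(scores, key=scores.get)
print(best)  # pricing page
```

The same argmax-over-cosine step is the final stage of any embedding-based retrieval pipeline; only the vector source changes.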
### Mobileclip B LT OpenCLIP
apple · Text-to-Image · 774 downloads · 9 likes
MobileCLIP-B (LT) is an efficient image-text model from Apple; trained with multi-modal reinforced training, it delivers fast zero-shot image classification and outperforms comparable models.

### Mobileclip B OpenCLIP
apple · Text-to-Image · 715 downloads · 3 likes
MobileCLIP-B is an efficient image-text model that uses multi-modal reinforced training for fast inference and strong zero-shot image classification.

### Mobileclip S2 OpenCLIP
apple · Text-to-Image · 99.74k downloads · 6 likes
MobileCLIP-S2 is an efficient text-image model that achieves fast zero-shot image classification through multi-modal reinforced training.

### Mobileclip S0 Timm
apple · Text-to-Image · 532 downloads · 10 likes
MobileCLIP-S0 is an efficient image-text model trained with multi-modal reinforced training, trading little accuracy for large gains in speed and model size.

### Llm Compiler 7b Ftd
facebook · Other · Large Language Model, Transformers · 106 downloads · 26 likes
LLM Compiler is a state-of-the-art model built on Code Llama for code optimization and compiler reasoning, far surpassing existing public models at understanding compiler optimizations.

### Kotoba Whisper V1.1
kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Japanese · 476 downloads · 33 likes
Kotoba-Whisper-v1.1 is a Whisper-based Japanese speech-recognition model with added punctuation and timestamp post-processing.

### Meta Llama 3 8B Instruct Function Calling
Trelis · Apache-2.0 · Large Language Model, Transformers, English · 499 downloads · 44 likes
A Llama 3 instruct model fine-tuned for function calling, available for commercial use under the Llama 3 Community License.

### Kotoba Whisper V1.0
kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Japanese · 2,397 downloads · 53 likes
Kotoba-Whisper is a collection of distilled Whisper models for Japanese speech recognition, developed by Asahi Ushio in collaboration with Kotoba Technologies; it is 6.3x faster than Whisper large-v3 at a comparably low error rate.

### Mamba 370m Hf
state-spaces · Large Language Model, Transformers · 6,895 downloads · 14 likes
Mamba is an efficient language model built on a state space model (SSM), able to model sequences in linear time.

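The linear-time claim comes from the SSM recurrence: the model carries a fixed-size state through the sequence, one update per step, instead of attending over all previous tokens. A numpy sketch of the discrete recurrence h_t = A h_(t-1) + B x_t, y_t = C h_t with a diagonal transition (Mamba's actual blocks use input-dependent, selective parameters; A, B, C here are made up for illustration):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Discrete linear SSM: one state update per step -> O(len(x)) time."""
    h = np.zeros_like(A)           # hidden state of fixed size d_state
    ys = []
    for x_t in x:                  # single pass over the sequence
        h = A * h + B * x_t        # state transition (elementwise: diagonal A)
        ys.append(float(C @ h))    # readout
    return np.array(ys)

d_state = 4
A = np.full(d_state, 0.9)          # per-step decay (hypothetical values)
B = np.ones(d_state)
C = np.ones(d_state) / d_state

x = np.array([1.0, 0.0, 0.0, 0.0])  # an impulse input
y = ssm_scan(x, A, B, C)
print(y)  # geometric decay: 1.0, 0.9, 0.81, 0.729
```

Note that memory stays constant in sequence length, which is why SSMs suit long contexts and low-latency generation.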
### Codellama 70B Python GPTQ
TheBloke · Large Language Model, Transformers, Other · 89 downloads · 19 likes
CodeLlama 70B Python is a Llama 2-based large language model specialized for Python, optimized for code generation and completion.

### Codellama 70B Instruct GGUF
TheBloke · Large Language Model, Other · 2,703 downloads · 57 likes
CodeLlama 70B Instruct is a large Llama 2-based code model optimized for code understanding and generation.

### Yi Ko 6b Text2sql
shangrilar · Large Language Model, Transformers · 1,918 downloads · 2 likes
A transformers model published on the Hugging Face Hub; its functions and features have not yet been documented.

### Mobilevlm 3B
mtgv · Apache-2.0 · Text-to-Image, Transformers · 346 downloads · 13 likes
MobileVLM is a fast, capable multimodal vision-language model designed for mobile devices, supporting efficient cross-modal interaction.

### Faster Whisper Base.en
Systran · MIT · Speech Recognition, English · 367.44k downloads · 4 likes
A CTranslate2 conversion of the Whisper base.en model for English speech recognition.

### Faster Whisper Medium.en
Systran · MIT · Speech Recognition, English · 65.17k downloads · 3 likes
A CTranslate2 conversion of the OpenAI Whisper medium.en model for efficient automatic speech recognition.

### Faster Whisper Large V3
Systran · MIT · Speech Recognition, Supports Multiple Languages · 713.48k downloads · 376 likes
Whisper large-v3 is OpenAI's large multilingual automatic speech recognition (ASR) model, here converted to CTranslate2 for speech-to-text in many languages.

### Vectorizer.vanilla
sinequa · Text Embedding, Transformers, English · 634 downloads · 0 likes
A vectorizer developed by Sinequa that generates embedding vectors from input passages or queries for sentence-similarity and retrieval tasks.

### Vectorizer V1 S Multilingual
sinequa · Text Embedding, Transformers, Supports Multiple Languages · 322 downloads · 0 likes
A multilingual Sinequa vectorizer that generates embedding vectors for input passages or queries, used for similarity computation and information retrieval.

### Vectorizer V1 S En
sinequa · Text Embedding, Transformers, English · 304 downloads · 0 likes
A Sinequa vectorizer that generates embedding vectors from paragraphs or queries for sentence similarity and feature extraction.

### Stt Kr Conformer Ctc Medium
SungBeom · Apache-2.0 · Speech Recognition, Korean · 176 downloads · 9 likes
A Korean automatic speech recognition model built on the Conformer architecture, optimized for streaming and strong in specific domains such as customer-service audio.

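The CTC in this model's name refers to its decoding scheme: the network emits one label (or a blank) per audio frame, and decoding collapses consecutive repeats and drops blanks. A minimal greedy CTC decoder over hypothetical per-frame labels (illustrative; production systems typically beam-search over the model's actual Korean token set):

```python
import itertools

BLANK = "_"  # stand-in for the CTC blank symbol

def ctc_greedy_decode(frames: list) -> str:
    """Collapse consecutive repeated labels, then remove blanks."""
    collapsed = (label for label, _ in itertools.groupby(frames))
    return "".join(label for label in collapsed if label != BLANK)

# Hypothetical per-frame argmax labels for a short utterance.
frames = ["h", "h", "_", "e", "l", "_", "l", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # hello
```

The blank between the two "l" runs is what lets CTC produce genuine double letters: repeats are merged only when no blank separates them.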
© 2025 AIbase